(54) OCR SYSTEM AND METHOD FOR GENERATING READ CONTROL INFORMATION TO BE
APPLIED TO THE SYSTEM

(57)Abstract: 

PROBLEM TO BE SOLVED: To make the generation of FC information more
efficient by automatically determining the effective field frames on a
document when the FC information is generated.
SOLUTION: In an OCR system having a function for generating FC
information 30, a PC main body 2 extracts field frames from image data
obtained from a slip by a scanner 1, and then determines the effective field
frames to be read by referring to field format dictionary information 32
registered in advance on an HDD 3. Field frame recognition information for
recognizing the effective field frames on the document to be read is set in
the FC information 30 according to this determination processing.
* NOTICES *

Japan Patent Office is not responsible for any
damages caused by the use of this translation.

1. This document has been translated by computer, so the translation may not reflect the original precisely.
2. **** shows a word which cannot be translated.
3. In the drawings, no words are translated.



CLAIMS 



[Claim(s)] 

[Claim 1] An OCR system comprising: photoelectric conversion means for converting recorded information, including
field frames, on a document to be read into image data; extraction means for extracting the field frames from the
image data; generation means for generating field frame recognition information in which, among the field frames
extracted by the extraction means, effective field frames are set as reading objects and field frames other than the
effective field frames are deleted; and registration means for registering the field frame recognition information.

[Claim 2] The OCR system according to claim 1, wherein the generation means generates the field frame recognition
information with reference to field format dictionary information in which the effective field frames are defined.

[Claim 3] The OCR system according to claim 1, wherein the registration means registers the field frame recognition
information as part of the reading control information used when the document is read.

[Claim 4] A method for creating reading control information applied to an OCR system that converts recorded
information, including field frames, on a document to be read into image data and performs character reading
processing on the image data based on reading control information registered in advance, the method comprising:
a step of extracting the field frames from the image data; a step of generating field frame recognition information
in which, among the field frames extracted in the extraction step, effective field frames are set as reading objects
and field frames other than the effective field frames are deleted; and a step of setting the field frame recognition
information in the reading control information.

[Claim 5] A computer-readable storage medium for use in an OCR system that converts recorded information,
including field frames, on a document to be read into image data and performs character reading processing on the
image data, the storage medium storing a program that causes the computer to perform: processing of extracting the
field frames from the image data; processing of generating field frame recognition information in which, among the
extracted field frames, effective field frames are set as reading objects and field frames other than the effective
field frames are deleted; and processing of registering the field frame recognition information as part of the reading
control information required for the character reading processing.
DETAILED DESCRIPTION



[Detailed Description of the Invention] 
[0001] 

[Technical Field of the Invention] The present invention relates to an OCR system that performs character reading
processing on a document using an optical character reader, and in particular to a function for creating reading
control information.
[0002] 

[Description of the Prior Art] Conventionally, in an OCR system that uses an optical character reader (OCR) to read
characters recorded on a document, reading control information called format control information (FC information)
must be created and registered. The FC information is managed by the computer (personal computer) that
constitutes the control unit (OCR control section) of the system, and is usually registered on a hard disk drive
(HDD).

[0003] The OCR control section recognizes the character string to be read from the image data obtained by
scanning with a scanner (photoelectric conversion sensor), and manages FC information that specifies, among other
things, the character type required to recognize each character of that string. On a document to be read, as shown
in drawing 8(A), the names of the reading items (name, purchased item, amount) are recorded together with
character frames that mark the field (reading field) of each item. Each character frame is usually printed in a
dropout color. The FC information requires field recognition information for recognizing the character frames and
the fields formed by combining them (hereinafter collectively referred to as field frames).
[0004] One method of creating the field recognition information in the FC information is for the OCR control
section to capture an actual document image (image data) from the scanner and extract candidate fields from that
image data. In this case, a scanner that also reads the dropout color as black data is used, and the OCR control
section detects the character frames arranged at a constant pitch as shown in drawing 8(A). When these character
frames are combined, fields such as A, B, and C can be inferred, as shown in drawing 8(B).
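
As an illustration only, the conventional combination of character frames described in [0004] can be sketched in
Python as follows. The Box class, the function name group_into_fields, the pixel coordinates, and the max_gap
tolerance are all assumptions made for this sketch; none of them appears in the original, which does not specify
how the continuity of character frames is tested.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """One detected character frame (hypothetical pixel coordinates)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def group_into_fields(boxes: List[Box], max_gap: int = 2) -> List[List[Box]]:
    """Combine character frames that continue along one row into candidate fields.

    Assumes the image is deskewed so that frames on one row share the same y.
    """
    fields: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        last = fields[-1][-1] if fields else None
        continues_row = (
            last is not None
            and box.y == last.y
            and 0 <= box.x - (last.x + last.w) <= max_gap
        )
        if continues_row:
            fields[-1].append(box)   # same field: the frame follows at the expected pitch
        else:
            fields.append([box])     # a gap or a new row starts a new candidate field
    return fields


# Three adjacent frames, one isolated frame, and two adjacent frames on a second row.
frames = [Box(0, 10, 10, 12), Box(10, 10, 10, 12), Box(20, 10, 10, 12),
          Box(50, 10, 10, 12), Box(0, 40, 10, 12), Box(10, 40, 10, 12)]
candidate_fields = group_into_fields(frames)
```

Because a grouping of this kind looks only at continuity, any rectangle-like mark becomes a candidate field, which
is the limitation taken up in [0006] and [0007].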
[0005] 

[Problem(s) to be Solved by the Invention] As described above, the OCR control section creates, from an actual
document image, the field recognition information for recognizing the field frames that are reading fields, and
registers it as part of the FC information. The FC information other than the field recognition information is
usually set based on information entered by the user.

[0006] For a document with a simple format such as that shown in drawing 8(A), the field recognition information
can be created automatically by the above technique. However, on a document with a complicated format, such as
that shown in drawing 9, frames that are unnecessary for reading processing exist in addition to the field frames
(fields A, B, C, and D) that are the actual reading fields. That is, as shown in drawing 10, the frames around the
reading item names themselves (name, purchased item, amount) and frames produced when, for example, the kana
character "**" is misrecognized as a frame become unnecessary frames.

[0007] The OCR control section forms fields automatically from the continuity of the character frames and, as
shown in drawing 11, ends up setting the field frames shown with dashed lines. However, as shown in drawing 12,
the field frames that are actual reading fields are only fields A, B, C, and D, shown with thick dashed lines. For
this reason, the user must either delete all the information corresponding to the frames shown with thin dashed
lines from the automatically created field recognition information, or select only the frames shown with thick
dashed lines (fields A, B, C, and D). Therefore, when FC information is created for a document with a complicated
format, troublesome work by the user is needed to set the effective field recognition information.

[0008] The purpose of the present invention is therefore to make the creation of FC information more efficient by
automatically performing the determination processing of the effective field frames on a document when the FC
information is created.
[0009] 

[Means for Solving the Problem] The present invention is an OCR system having a function for creating FC
information, comprising extraction means for extracting field frames from image data obtained from a document, and
generation means for generating field frame recognition information in which, among the extracted field frames,
effective field frames are set as reading objects and field frames other than the effective field frames are deleted.

[0010] Specifically, this system extracts field frames from an actual document image and, when generating the
field frame recognition information for recognizing the effective field frames to be read, generates that
information with the unnecessary field frames deleted. This field frame recognition information is registered as
part of the FC information. The generation means deletes the unnecessary field frames, that is, all field frames
other than the effective ones, with reference to field format dictionary information that serves as the criterion
for setting the effective field frames.

[0011] With this configuration, when FC information is created and registered, the effective field recognition
information can be created and registered without the user having to delete all the unnecessary field frames or
perform the troublesome operation of selecting the effective field frames.
[0012] 

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the
drawings.

(System configuration) Drawing 1 is a block diagram showing the configuration of the OCR system according to this
embodiment. As shown in drawing 1, the system consists of a scanner 1, the main body of a personal computer (PC
main body) 2, an HDD 3, a character recognition device 4, and an I/O device 5. The scanner 1 scans the document to
be read, photoelectrically converts the recorded information such as character strings and field frames, and
outputs image data.

[0013] The PC main body 2 functions as the OCR control section, including the FC information creation function
of this embodiment, by executing OCR control software. The HDD 3 is an external storage device accessed under the
control of the PC main body 2, and stores the OCR control software together with the FC information 30 and the
field format dictionary information (FF dictionary information) 32 described later.

[0014] The character recognition device 4, under the control of the PC main body 2, performs recognition
processing one character at a time on the character strings (image data) of the document captured by the scanner 1.
The character recognition device 4 may be a standalone device, or may be a logical device realized by character
recognition software executed by the PC main body 2. The I/O device 5 comprises a display unit for displaying the
document image captured by the scanner 1, the FC information, character recognition results, and the like, and an
input unit, including a keyboard and a mouse, for entering various data and commands into the PC main body 2.

(Creation processing of FC information) The creation processing of FC information in this embodiment is described
below with reference to drawing 1 and drawings 2 to 7.

[0015] This embodiment concerns the processing that creates, within the FC information, the field frame
recognition information 31 for recognizing field frames. As shown in drawing 2(A) to (E), the information on the
format of a field frame recorded (printed) on a document includes "presence or absence of a composite field" (a
field formed by combining as many character frames as there are digits), "character width", "character height",
"character pitch", and "field width". Further format information includes "number of digits", "line type", "line
width", and "****". Here, "line type" means the type of the frame line (dashed line, dash-dot line, dash-double-dot
line, and so on). "Line width" means the width of the frame line, and "****" means the color of the frame line.
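
For orientation, the format items listed in [0015] could be held in a single record per field frame. The Python
sketch below is an assumption for illustration only: the class name FrameFormat, the attribute names, the
millimetre unit, and the sample values are hypothetical, and line_color is simply a label chosen here for the item
whose name was left untranslated as "****".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrameFormat:
    """Format attributes of one field frame (names and units are assumed)."""
    composite: bool              # True if the field combines several character frames
    char_width: float            # width of one character frame, in mm
    char_height: float           # height of one character frame, in mm
    char_pitch: Optional[float]  # spacing between character frames; None if independent
    field_width: float           # overall width of the field, in mm
    digits: int                  # number of character positions in the field
    line_type: str               # e.g. "solid", "dashed", "dash-dot"
    line_width: float            # width of the frame line, in mm
    line_color: str              # color of the frame line


# Hypothetical values: a five-digit composite field and an independent field.
frame_i = FrameFormat(True, 5.0, 7.0, 5.0, 25.0, 5, "solid", 0.3, "dropout-red")
frame_h = FrameFormat(False, 40.0, 7.0, None, 40.0, 1, "solid", 0.3, "dropout-red")
```

Keeping every item in one immutable record makes it straightforward to compare an extracted frame against a
dictionary entry item by item.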

[0016] When FC information is created in this embodiment, an actual document such as that shown in drawing 4 is
prepared. The PC main body 2 inputs the document image as image data from the scanner 1 (step S1). The PC main
body 2 then extracts a plurality of field frames (frames a to k) from the image data by frame extraction
processing, as shown in drawing 4 (step S2). Through this extraction processing, the PC main body 2 generates
attribute information for each of the extracted field frames a to k, as shown in drawing 5. The field frames a to h
are independent fields that do not constitute a composite field. Of these, field frame a is in fact the kana
character "**" misrecognized as a character frame. The field frames i to k, on the other hand, are fields
recognized as composite fields, each combining the character frames of five digits.
[0017] Next, the PC main body 2 performs processing to determine which of the extracted field frames a to k are
effective field frames to be read, with reference to the FF dictionary information 32 registered on the HDD 3 before
the creation processing (step S3). As shown in drawing 3, the FF dictionary information 32 consists of format
information defining the effective field frames to be read. This embodiment assumes that two kinds of information
are prepared as the FF dictionary information 32: type A for composite field frames, and type B for non-composite
fields (independent fields consisting of a single character frame).

[0018] Specifically, the PC main body 2 refers to the FF dictionary information 32 and the attribute information
of each of the field frames a to k (see drawing 5), performs the following determination processing, and deletes
the unnecessary field frames from the extracted field frames (step S4). That is, it judges whether each extracted
field frame a to k corresponds to type A or type B of the FF dictionary information 32. For the extracted field
frames a to g, definition items such as "character height" fall outside the applicable range when compared with
the type B format information for non-composite fields, so the PC main body 2 judges them to be unnecessary field
frames. For field frame h, which is a non-composite field, the definition items "character height", "field width",
"line type", and "line width" all fall within the applicable range, so it is judged to be an effective type B field
frame. For the field frames i to k, which are composite fields, all the definition items fall within the applicable
range of type A of the FF dictionary information 32, so they are judged to be effective field frames to be read.
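
The judgment in [0018] amounts to checking each extracted frame against the applicable ranges of type A and type
B. The following Python sketch shows one way this could look. The layout of FF_DICTIONARY, the item names, the
numeric ranges, and the sample frames are hypothetical; the original states only that each definition item must
fall within the applicable range of a type.

```python
from typing import Dict, Optional

# Hypothetical FF dictionary. Each type records whether it describes a composite
# field, the allowed line types, and an inclusive (min, max) range per item.
FF_DICTIONARY = {
    "A": {"composite": True, "line_types": {"solid"},
          "ranges": {"char_height": (6.0, 8.0), "char_pitch": (4.5, 5.5),
                     "field_width": (20.0, 30.0), "line_width": (0.1, 0.5)}},
    "B": {"composite": False, "line_types": {"solid"},
          "ranges": {"char_height": (6.0, 8.0), "field_width": (20.0, 60.0),
                     "line_width": (0.1, 0.5)}},
}


def classify(frame: Dict, dictionary: Dict = FF_DICTIONARY) -> Optional[str]:
    """Return the matching type name, or None for an unnecessary field frame."""
    for type_name, entry in dictionary.items():
        if frame.get("composite") != entry["composite"]:
            continue  # composite frames are compared only with composite types
        if frame.get("line_type") not in entry["line_types"]:
            continue
        in_range = all(
            item in frame and low <= frame[item] <= high
            for item, (low, high) in entry["ranges"].items()
        )
        if in_range:
            return type_name
    return None


# Sample attribute records loosely modeled on frames a, h, and i in the text.
frame_a = {"composite": False, "line_type": "solid", "char_height": 3.0,
           "field_width": 4.0, "line_width": 0.3}
frame_h = {"composite": False, "line_type": "solid", "char_height": 7.0,
           "field_width": 40.0, "line_width": 0.3}
frame_i = {"composite": True, "line_type": "solid", "char_height": 7.0,
           "char_pitch": 5.0, "field_width": 25.0, "line_width": 0.3}
```

In this sketch, a frame that matches no type is reported as None, which corresponds to the frames that the PC main
body 2 deletes as unnecessary.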
[0019] Through this determination processing, the PC main body 2 deletes the unnecessary field frames a to g from
the extracted field frames a to k, and determines field frame h as an effective field frame of the type B format
and field frames i to k as effective field frames of the type A format. That is, as shown in drawing 7, the field
frames h to k drawn with thick lines in the document image are determined to be the effective fields to be read,
and the other field frames are deleted. The PC main body 2 then registers the format information determined for
the effective field frames in the FC information 30 as the field frame recognition information 31 (step S5).
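
Steps S3 to S5 can be summarized as a filter followed by a registration step. The Python sketch below is a
simplified illustration under stated assumptions: build_fc_information, toy_classify, the dictionary keys, and the
sample values are hypothetical names and data, and a plain dictionary stands in for the FC information 30 that the
original stores on the HDD 3.

```python
from typing import Callable, Dict, Optional


def build_fc_information(
    extracted: Dict[str, Dict],
    classify: Callable[[Dict], Optional[str]],
    other_settings: Optional[Dict] = None,
) -> Dict:
    """Keep only the frames that match a dictionary type and store them as FC information."""
    effective = {}
    for name, attributes in extracted.items():
        type_name = classify(attributes)
        if type_name is not None:  # frames with no matching type are dropped
            effective[name] = {"type": type_name, **attributes}
    fc_information = dict(other_settings or {})  # user-entered settings, e.g. character type
    fc_information["field_frame_recognition"] = effective
    return fc_information


def toy_classify(attributes: Dict) -> Optional[str]:
    """Stand-in for the dictionary lookup: accept frames with a plausible character height."""
    if not 6.0 <= attributes["char_height"] <= 8.0:
        return None
    return "A" if attributes["composite"] else "B"


extracted_frames = {
    "a": {"composite": False, "char_height": 3.0},
    "h": {"composite": False, "char_height": 7.0},
    "i": {"composite": True, "char_height": 7.0},
}
fc_information = build_fc_information(
    extracted_frames, toy_classify, {"character_type": "numeric"}
)
```

Passing the classifier in as a parameter keeps the filtering step independent of how the dictionary comparison is
implemented.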

[0020] As described above, according to this embodiment, after field frames are extracted from an actual document
image, the effective field frames to be read can be determined automatically by referring to the field format
dictionary information 32 registered in advance. The user is therefore spared the troublesome work of deleting
unnecessary field frames from the extracted field frames or selecting only the effective ones. As a result, the
efficiency of creating FC information, which contains the information for recognizing the field frames that define
the reading fields on a document, can be improved substantially. The FC information 30 other than the field frame
recognition information 31 (including information required for character recognition processing, such as the
character type) is registered on the HDD 3 in the usual way, by the user operating the input unit based on the
format of the document to be read.

[0021] Although this embodiment describes the case where the creation processing of FC information is performed
by software stored on the HDD 3, the software may instead be installed on the HDD 3 from a removable storage
medium such as a floppy disk or an optical disk.
[0022] 

[Effect of the Invention] As explained in detail above, according to the present invention, in an OCR system having
a function for creating FC information, the determination processing of the effective field frames on a document
can be performed automatically when the FC information is created. Therefore, when FC information is created and
registered, the troublesome work in which the user deletes all the unnecessary field frames or selects the
effective field frames can be omitted. As a result, the FC information required for reading a document can be
created efficiently.
DESCRIPTION OF DRAWINGS



[Brief Description of the Drawings] 

[Drawing 1] A block diagram showing the configuration of the OCR system according to the embodiment of the
present invention.

[Drawing 2] A conceptual diagram for explaining the format information of a field frame in this embodiment.

[Drawing 3] A conceptual diagram showing an example of the field format dictionary information in this
embodiment.

[Drawing 4] A conceptual diagram showing the document image used for the FC information creation processing of
this embodiment.

[Drawing 5] A conceptual diagram showing the format information generated when the field frames are extracted in
this embodiment.

[Drawing 6] A flowchart for explaining the creation processing of FC information in this embodiment.

[Drawing 7] A conceptual diagram showing the document image determined by the determination processing of the
field frames in this embodiment.

[Drawing 8] A conceptual diagram showing the document image used by a conventional OCR system.

[Drawing 9] A conceptual diagram for explaining the extraction processing of field frames in a conventional OCR
system.

[Drawing 10] A conceptual diagram for explaining the extraction processing of field frames in a conventional OCR
system.

[Drawing 11] A conceptual diagram for explaining the extraction processing of field frames in a conventional OCR
system.

[Drawing 12] A conceptual diagram for explaining the extraction processing of field frames in a conventional OCR
system.

[Explanation of Reference Numerals]

1 - Scanner

2 - Main body of a personal computer (OCR control section)

3 - Hard disk drive (HDD)

4 - Character recognition device

5 - I/O device